Splitting Methods for Decision Tree Induction: A Comparison of Two Families
Abstract
Decision tree (DT) induction is among the most popular data mining techniques. An important component of DT induction algorithms is the splitting method, and the most commonly used methods are based on the conditional entropy family. However, it is well known that no single splitting method gives the best performance for all problem instances. In this paper we explore the relative performance of the Conditional Entropy family and of another family based on the Class-Attribute Mutual Information (CAMI) measure. Our results suggest that while some datasets are insensitive to the choice of splitting method, others are very sensitive to it. For example, some of the CAMI family methods may be more appropriate than GainRatio (GR) for datasets where all non-class attributes are nominal, and some of the CAMI methods perform as well as GR for datasets where all the non-class attributes are either integer or continuous. Given that it is never known beforehand which splitting method will lead to the best DT for a given dataset, and given the relatively good performance of the CAMI methods, it seems appropriate to suggest that splitting methods from the CAMI family be included in data mining toolsets.
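The entropy-based splitting the abstract refers to can be illustrated with a minimal sketch. This is not the paper's implementation; it follows the standard C4.5-style GainRatio definition (information gain normalized by split information) on a hypothetical toy dataset:

```python
# Minimal sketch of entropy-based splitting with GainRatio (C4.5-style).
# The dataset and attribute names below are illustrative, not from the paper.
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, labels, attr):
    """Information gain of splitting on `attr`, normalized by split info."""
    n = len(labels)
    partitions = {}
    for row, y in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(y)
    # Conditional entropy of the class given the attribute value.
    cond = sum(len(ys) / n * entropy(ys) for ys in partitions.values())
    gain = entropy(labels) - cond
    # Split information penalizes attributes with many distinct values.
    split_info = -sum((len(ys) / n) * math.log2(len(ys) / n)
                      for ys in partitions.values())
    return gain / split_info if split_info > 0 else 0.0

# Toy example: one binary attribute that perfectly separates the classes.
rows = [{"a": 0}, {"a": 0}, {"a": 1}, {"a": 1}]
labels = ["no", "no", "yes", "yes"]
print(gain_ratio(rows, labels, "a"))  # 1.0: gain = 1 bit, split info = 1 bit
```

A CAMI-style criterion would replace the `gain / split_info` score with one derived from class-attribute mutual information, but the node-splitting loop around it stays the same.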
Similar resources
Non-parametric classification of protein secondary structures
Proteins were classified into their families using a classification tree method which is based on the coefficient of variations of physico-chemical and geometrical properties of the secondary structures of proteins. The tree method uses as splitting criterion the increase in purity when a node is split into two subnodes and the size of the tree is controlled by a threshold level for the improve...
The Comparison of Gini and Twoing Algorithms in Terms of Predictive Ability and Misclassification Cost in Data Mining: An Empirical Study
The classification tree is commonly used in data mining, particularly for investigating interactions among predictors. The splitting rule and the decision tree technique employ algorithms that are largely based on statistical and probability methods. The splitting procedure is the most important phase of classification tree training. The aim of this study is to compare the Gini and Twoing splitting rul...
Comparison of gestational diabetes prediction with artificial neural network and decision tree models
Background: Gestational diabetes mellitus (GDM) is one of the most common metabolic disorders in pregnancy, which is associated with serious complications. In the event of early diagnosis of this disease, some of the maternal and fetal complications can be prevented. The aim of this study was to early predict gestational diabetes mellitus by two statistical models including artificial neural ne...
Families of splitting criteria for classification trees
Several splitting criteria for binary classification trees are shown to be written as weighted sums of two values of divergence measures. This weighted sum approach is then used to form two families of splitting criteria. One of them contains the chi-squared and entropy criterion, the other contains the mean posterior improvement criterion. Both family members are shown to have the property of ...
Comparison of Decision Tree and Naïve Bayes Methods in Classification of Researcher’s Cognitive Styles in Academic Environment
In today's world of the internet, it is important to give users feedback based on what they demand. Moreover, one of the important tasks in data mining is classification. Today, there are several classification techniques for solving classification problems, such as Genetic Algorithms, Decision Trees, Bayesian methods, and others. In this article, we attempt to classify researchers as “Expert” and “No...
Publication date: 2013